ABSTRACT

EcoCyc (http://EcoCyc.org) is a model organism
database built on the genome sequence of 
Escherichia coli K-12 MG1655. Expert manual 
curation of the functions of individual E. coli gene 
products in EcoCyc has been based on information 
found in the experimental literature for E. coli 
K-12-derived strains. Updates to EcoCyc content 
continue to improve the comprehensive picture of 
E. coli biology. The utility of EcoCyc is enhanced 
by new tools available on the EcoCyc web site, 
and the development of EcoCyc as a teaching tool 
is increasing the impact of the knowledge collected 
in EcoCyc. 

OVERVIEW 

EcoCyc has a long history of capturing Escherichia coli 
biology. In 1994, expert manual curation of EcoCyc began 
by covering the area of metabolic pathways and enzymes. 
Since then, EcoCyc has evolved in both breadth and 
depth: it now incorporates the functional annotation of 
all gene products, including proteins and RNAs outside 
of metabolic pathways. Many new data types, such as
evidence codes, signaling pathways, transcriptional and
post-transcriptional regulation and Gene Ontology (GO)
annotations, have been added by curators. Highlights of
our progress in updating EcoCyc content and the
functions of E. coli gene products are described later
and summarized in Table 1.

We propose that a next step in the evolution of model 
organism databases (MODs), such as EcoCyc, is to 
become computational models of their respective organ- 
isms. We have generated a steady-state metabolic flux 
model from EcoCyc using the MetaFlux (1) implementa- 
tion of flux balance analysis (FBA), and we have used that 
model to predict the growth phenotype (growth or no 
growth) of E. coli under many different nutrient and 
gene knockout conditions. We are undertaking a 
long-term iterative effort to perform these computational 
predictions, to compare the computational results to ex- 
perimental results, and to investigate the differences 
between the two. We will update the metabolic reaction 
model within EcoCyc when warranted to resolve these 
differences, and we hope that our efforts will lead to 
new experimental investigations in cases where the pheno- 
typic observations cannot be explained. 

One merit of our proposed marriage of MODs with 
systems biology is that it yields higher-quality databases; in
fact, it has already led to improvements in EcoCyc, such
as the addition of previously overlooked reactions from 
the literature and the correction of reaction direction 
information. Subjecting a database to computational con- 
sistency checks can identify errors that manual analysis 
overlooks. Rather than scattering proposed model correc- 
tions across many publications, it is critical to integrate 
these corrections in a central resource to ensure their avail- 
ability to the scientific community in general, and to future 
modeling efforts in particular. Furthermore, general users
of the database will benefit from the inclusion within the
database of information required by the modeling effort,
such as gene knockout phenotypes.

Table 1. EcoCyc content and E. coli gene product functions

Data type                                        Number
Genes                                            4499
Gene products covered by a mini-review           3706
Gene products with GO terms with EXP evidence    2462
Enzymes                                          1485
Metabolic reactions                              1577
Compounds                                        2363
Transporters                                     264
Transport reactions                              348
Transported substrates                           254
Transcription factors                            188
Regulatory interactions                          5827
    Transcription initiation                     3207
    Transcription attenuation                    20
    Regulation of translation                    114
    Enzyme modulation                            2468
    Other                                        18
Literature citations                             23 909

A second merit of our approach is that it will yield a 
more efficient and transparent modeling effort. Metabolic 
models require carefully curated lists of reactions, of 
chemical structures and of gene-protein-reaction relationships,
and by directly leveraging the results of EcoCyc
curation in a modeling effort, we can greatly reduce the
amount of model-specific curation that is needed.
Modeling efforts also require large amounts of data for
evaluating model correctness, such as growth assays of the 
organism under many different nutrient and gene knock- 
out conditions. Gathering, integrating and arbitrating 
among these data sets when they disagree can require sub- 
stantial effort and can be carried out effectively within a 
MOD project. Modeling also becomes more efficient if 
successive phases of modeling build on the model corrections
formulated in earlier phases, which is simplified if all
model corrections are aggregated in a central database. 
Interpretation of model results and debugging of model 
errors will be accelerated by the ability to quickly access 
information about gene-protein-reaction relationships, 
regulatory information, genome arrangements and 
known literature about each gene. Interpretation of
model results is sped up by computational tools such as
the ability to visualize the hundreds of reaction flux rates
predicted by a metabolic model onto a complete metabolic 
map diagram. The same visualization tools also make 
these models more transparent to the larger scientific com- 
munity — rather than making model results available as 
large, cryptic spreadsheets and other data files, model 
results can be interpreted relative to web-accessible data- 
bases with powerful visualization tools. 

Thus, driven in part by metabolic modeling efforts and 
in part by the utility of genome-scale data sets to the larger 
E. coli community, another new direction for EcoCyc is 
the integration of multiple large-scale growth and gene 
essentiality data sets into EcoCyc.



EcoCyc is part of the BioCyc collection of organism- 
specific pathway/genome databases at http://BioCyc.org 
(2). Among the nearly 2000 BioCyc databases are >130 
databases for sequenced strains of E. coli and Shigella, 
including pathogenic, non-pathogenic, human micro- 
biome and laboratory strains. These databases were auto- 
matically built using the Pathway Tools software and were 
not human-curated. Leveraging the large amount of 
experimental data collected in EcoCyc, we have begun 
to transfer functional annotations of gene products from 
E. coli K-12 MG1655 to their orthologs in the closely 
related K-12 strain W3110 and the E. coli B strain 
REL606. This allows us to focus manual curation efforts
on the gene products and functions that differ between 
these strains. 



UPDATE ON EcoCyc DATA 

Update of transcriptional regulation data 

Regulation of transcription initiation in EcoCyc has been 
kept up to date with the experimental literature. Table 2 
summarizes the type and the number of regulation objects 
that are present in EcoCyc, as well as new objects added in 
the past 2 years. In addition, we have added missing 
evidence codes and literature citations to 210 promoters 
and to 85 regulatory interactions. All promoters and regu- 
latory interactions now have at least one evidence code 
and one reference. 

Allosteric regulation of RNA polymerase by guanosine
tetraphosphate (ppGpp) and DksA

Regulation of transcription initiation goes beyond activa- 
tor and repressor proteins that bind to the chromosome 
near promoter sequences. We have started curation of the 
alarmone guanosine tetraphosphate (ppGpp or 'magic 
spot') and the small protein DksA, both capable of 
binding RNA polymerase and thereby negatively 
regulating transcriptional activity of ribosomal RNA 
and transfer RNA genes in response to nutritional stress 
(3,4). ppGpp and DksA stimulate expression of proteins 
required for amino acid biosynthesis and transport (5-7), 
and some σE-dependent promoters (8) (Supplementary
Figure S1). A total of 70 allosteric interactions have
been curated; 29 are associated with ppGpp, 10 with 
DksA and 31 with both factors. 

Improved annotation of transcription factor binding 
sites (TFBSs) 

We continued updating and assigning the symmetry, 
length and consensus sequence for 130 transcription 
factors (TFs). As a consequence of this dedicated 
curation, we have relocated and reassigned TF binding 
sites (TFBSs) for 33 TFs and generated new regulatory 
interactions. 

We used several strategies to identify the properties of
the TFBSs: we performed manual alignments of the regions
upstream of the regulated genes, compared orthologous
intergenic regions and used information from other
databases, such as PRODORIC (9), RegPrecise (10),
Tractor_DB (11) and FITBAR (12). In all cases, we also
analyzed the available classical experimental evidence that
corresponded to each regulatory interaction. All binding
sites for a given TF were analyzed at the same time, using
the biological knowledge of the mechanisms of action of
TFs, preferential position and number of TFBSs, simple
and complex regulation and families of regulators, among
other aspects.

Table 2. Types and numbers of EcoCyc regulation objects

Object type                Total    New with high-confidence    New with computational or
                                    experimental evidence       low-confidence experimental evidence
Transcription units        3473     19                          48
Promoters                  3766     53                          1847
Terminators                251      0                           12
TFs (a)                    188      11                          0
TFBSs                      2701     183                         144
Regulatory interactions    3207     69                          412

(a) TFs include DNA-binding TFs, as well as RNA polymerase-binding
regulators.

Improved position weight matrices (PWMs) and
computational predictions of TFBSs 

Computational predictions of TFBSs strongly rely on the 
quality of the position weight matrices (PWMs) that are
used to scan regulatory regions, and on the threshold for 
selecting or rejecting TFBS predictions. To build a matrix, 
a minimum of four non-overlapping annotated binding 
sites is required for each TF. Medina-Rivera et al. (13) 
published a tool to assess the quality of matrices and
define the appropriate score threshold. This tool has 
been used to evaluate and improve the matrices used to 
predict sites in E. coli K-12 regulatory regions. This evalu- 
ation, together with the continued detailed curation of 
TFBSs mentioned earlier and the increase in experimen- 
tally determined binding sites, has helped to increase the 
reliability of TFBS prediction. In 2010, a total of 11 522
binding sites were predicted for 71 TFs, whereas in 2012, 
we predicted fewer sites, 8718 improved predictions, for a 
larger set of 83 TFs. The improved PWMs were used to 
curate a set of regulatory interactions that had no binding 
site identified despite having experimental evidence that 
supported them. Our current manual curation of the pre- 
dicted sites has identified TFBSs for 35 interactions. 
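
As an illustration of how a PWM and a score threshold interact, the following minimal sketch builds a log-odds matrix from a handful of aligned sites and scans a sequence with it. The log-odds formula, the pseudocount, the uniform background and the toy sites are assumptions made here for illustration; they are not the matrices, thresholds or software used for the EcoCyc predictions.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites."""
    # The text requires at least four annotated sites per TF.
    if len(sites) < 4:
        raise ValueError("at least four annotated binding sites are required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (offset, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits

# Hypothetical toy sites, not real EcoCyc binding sites.
sites = ["TGTGA", "TGTGA", "TGTCA", "TGAGA"]
pwm = build_pwm(sites)
hits = scan("CCTGTGACC", pwm, threshold=4.0)
print(hits)
```

Raising the threshold trades sensitivity for specificity, which is one way a smaller set of predictions can still be a more reliable one.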

Computationally predicted promoters 

In addition to the curation of literature, we recently added 
1852 computationally predicted promoters. The predicted 
potential promoters contain sequences similar to those 
recognized by six of the seven known sigma factors in 
E. coli: σ24, σ28, σ32, σ38, σ54 and σ70. These predictions
were generated by scanning 250-bp regions upstream of
genes that lack reported promoters with PWMs for each
sigma factor. PWMs and predictions for σ70 housekeeping
promoters were generated as reported by Huerta and
Collado-Vides in 2003 (14). An updated version of this
strategy was used to generate PWMs and predictions for
all other sigma factors, except for σ19, as there is only
one reported σ19-dependent promoter.

Updates to the transport systems in EcoCyc 

Curation of the E. coli transport systems in EcoCyc 
focuses foremost on the addition of new or significant 
functional characterizations reported in the literature.
Since 2010, new transport functions have been assigned 
to nine previously uncharacterized membrane proteins 
(Supplementary Table S1). Motivated by the computational
analyses described earlier, extensive literature
searches resulted in the addition of transport reactions 
for a further 33 compounds in EcoCyc (Supplementary 
Table S2). 

First, a long-running project to assess the dead-end me- 
tabolites identified within EcoCyc (15) continues to yield 
valuable information regarding the transport capabilities 
of E. coli K-12. A more detailed description of this 
analysis is the subject of a separate article. 

Second, comparison of the experimental growth pheno- 
types recorded in EcoCyc with phenotypes predicted by 
computer modeling highlighted instances where transport 
reactions might be missing in EcoCyc. For example, E. 
coli K-12 is able to use pyruvate as a sole carbon and 
energy source (16), but no transport reaction for 
pyruvate was present in EcoCyc. A search of the literature
revealed further information regarding the energetics of 
pyruvate transport in E. coli K-12 (17), and a reaction 
representing the transport of pyruvate across the inner 
membrane was thus added to EcoCyc, although the cor- 
responding transporter is unknown. A total of 58 nutrient 
sources (carbon, nitrogen, sulfur or phosphorus) known
to be capable of supporting growth were assessed in this 
way. Compounds for which transport reactions were 
added are shown in Supplementary Table S2. Literature 
references supporting the assertions of transport can be 
accessed via EcoCyc. 

Review of transport protein nomenclature 

EcoCyc has been curating transport proteins and trans- 
port reactions for many years, and over time, a variety of 
different curator approaches has resulted in transport 
protein nomenclature that was inconsistent and sometimes 
obscure. In many cases, the subunits of transport com- 
plexes were identified only by their gene name. Our aim 
in reviewing the nomenclature is to introduce a set of 
guidelines that will enable curators to apply consistent
informative names to transport proteins, transport 
complexes and their subunits. In doing so, we have 
taken into account the prokaryotic protein naming guidelines
developed by UniProt and also the International
Union of Biochemistry and Molecular Biology
(IUBMB)-approved classification system for membrane
transport proteins known as the Transporter 
Classification system (18). 

EcoCyc transport protein names are now indicative of 
substrate and transport energetics where possible. Gene 
names are not generally included, except in cases where 
more than one enzyme catalyses the same reaction. 
Transport class acronyms are retained if thought to be 
widely recognizable (e.g. for ABC transporters and PTS
permeases), but removed if less common. The individual 
subunits of transport complexes are named with the 
complex name followed by a specific subunit name. 
Table 3 lists examples of old and new transport protein 
names in EcoCyc that are illustrative of the improvements 
made. Approximately 450 individual transport proteins 
have been renamed in EcoCyc. 

INTEGRATION OF PHENOTYPE DATA SETS 

The full set of conditions that are suitable to sustain life 
for a bacterium is a fundamental collection of knowledge 
for that organism. Therefore, we have integrated data for 
18 individual growth media and large-scale respiration 
measurements from five phenotype microarray (PM) ex- 
periments [(19,20,21); B. Bochner and X. Lei, personal 
communication; A. Mackie and I. Paulsen, personal 
communication]. Each PM experiment assays respiration
(which is often treated as a proxy for growth) in
one or more of a set of four 96-well plates containing a 
large set of standardized nutrient mixtures, and each well 
is counted as one growth observation. We integrated 1422 
PM growth observations under aerobic conditions and 
190 growth observations under anaerobic conditions. 
Many differences among these PM observations were 
detected and will be discussed in detail in a separate 
publication. 

A summary of all growth conditions present in EcoCyc 
is available on the All Growth Media page, which can be 
retrieved through the command Search->Growth Media
and then clicking the button 'All Growth Media for 
this Organism'. The first table in this web page lists individual
growth media; subsequent tables list data for PMs
(Figure 1). A button above the first table allows the user to 
select aerobic versus anaerobic conditions and wild type 
versus mutant strains. The PM tables are color coded to 
indicate the degree of growth observed. When multiple 
observations are available for a given cell, the color of 
the cell is determined as follows. If all observations 
agree (e.g. all observations indicate growth), then the 
color of the cell indicates that growth level (e.g. growth). 
If the observations differ, but a curator has arbitrated 
among them and assigned a consensus result (e.g. for 
citrate, no growth was observed for most observations, 
and no growth was also observed by low-throughput ex- 
periments in the literature), the overall color of the cell 



reflects that consensus (no growth), but a small grid 
within the cell shows the individual observations (move 
the mouse over an element of that grid for a citation to 
each experiment). If the observations differ, and a curator 
has not arbitrated among the differences, then the overall 
color of the cell indicates that the observations are incon- 
sistent, and the small grid within the cell shows the indi- 
vidual observations. 
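
The coloring rule described above can be summarized in a few lines. This is a minimal sketch under stated assumptions: the function name, the status labels and the handling of wells with no observations are hypothetical choices made here, not the EcoCyc web site code.

```python
def cell_status(observations, curated_consensus=None):
    """Summarize the growth observations recorded for one PM well.

    observations: list of labels such as "growth", "low growth", "no growth".
    curated_consensus: label assigned by a curator, or None if not arbitrated.
    """
    if not observations:
        return "no data"
    if len(set(observations)) == 1:
        # All observations agree, so the cell takes that growth level.
        return observations[0]
    if curated_consensus is not None:
        # Observations differ, but a curator has arbitrated among them.
        return curated_consensus
    # Observations differ and no curator has arbitrated.
    return "inconsistent"

# Hypothetical observations modeled on the citrate example in the text.
citrate = ["no growth", "no growth", "growth", "no growth", "no growth"]
print(cell_status(citrate, curated_consensus="no growth"))
print(cell_status(citrate))
```

In the two disagreeing cases, the individual observations would still be displayed in the small grid within the cell; the sketch covers only the overall color.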

Clicking on a cell within the All Growth Media page 
produces a page describing all its chemical components 
and listing all growth observations available for that 
growth medium across all available conditions and 
mutants. For example, the medium 'MOPS medium with 
0.4% glucose' lists growth observations for 4214 single- 
gene knockouts of E. coli. 

Gene knockout data 

Gene essentiality information is useful for predicting anti- 
biotic targets for pathogenic bacteria and for guiding the 
design of minimal genomes. It provides clues regarding the 
functions of genes of unknown function. Additionally, it is 
useful for validating genome-scale metabolic flux models
because those models can simulate the effects of knock- 
outs; model results are compared with the experimental 
data to assess model accuracy. We have loaded five 
high-throughput gene knockout data sets into EcoCyc 
(22-26) that include > 13 000 individual gene-knockout 
growth observations. Each growth observation is tied to 
the growth medium in which the observation was made, as 
the notion of gene essentiality depends strongly on the 
conditions under which essentiality is assessed. Gene 
knockout phenotypes are shown both on the growth 
medium page and in a table within the gene page 
(Figure 2). 

BEYOND E. coli K-12 MG1655

The E. coli strain K-12 MG1655 was chosen for genome 
sequencing because it had undergone comparatively 
minimal genetic manipulation since its isolation; the 
completed sequence was published in 1997 (27) and
updated in 2006 (28,29). Since then, several other 
commonly used laboratory strains, as well as many patho- 
genic and commensal strains of E. coli, have been fully 
sequenced. Because of the large number of genome 
sequences, manual curation of even a small subset of the 
resulting databases is neither feasible nor efficient. 



Table 3. Examples of renamed transport proteins in EcoCyc

Former name                            Revised name
YdeA MFS transporter                   arabinose efflux transporter
rhamnose RhaT transporter              rhamnose/lyxose:H+ symporter
GabP APC transporter                   4-aminobutyrate:H+ symporter
MglB                                   galactose ABC transporter - periplasmic binding protein
MglC                                   galactose ABC transporter - membrane subunit
MglA                                   galactose ABC transporter - ATP-binding subunit
CorA magnesium ion MIT transporter     Ni2+/Co2+/Mg2+ transporter
MalX                                   maltose/glucose PTS permease - MalX subunit
EmrE SMR transporter                   multidrug efflux transporter EmrE

Figure 1. Biolog PM1 plate depicting E. coli carbon source utilization results from five different experiments under aerobic growth conditions (wild type at 37 °C). Cells are color coded as growth/respiration, low growth/respiration, no growth/respiration, inconsistent results or no data.



To address this problem, the Pathway Tools software now 
includes automated and manual tools for curators to 
transfer annotations from a well-curated 'master' MOD 
to orthologs in databases of its less-well curated relatives. 

To limit the likelihood of inappropriate transfer of an- 
notations, the criteria used by the automated tool are strict. 
Candidate gene pairs are identified on the basis of sequence 
orthology, defined as the best bidirectional BLAST hit. In 
addition, cutoffs for alignment quality (BLAST P-value of
10"'**), alignment length and synteny are enforced, and the 
presence of existing annotations that may conflict with 
transferred annotations is taken into account. Functions 
of individual orthologs that do not meet all of these 
criteria, but should nevertheless be transferred, can be 
propagated by a curator. The values copied from genes/ 
proteins in the 'master' database include the gene and 
gene product names and synonyms, heteromultimeric 
complexes, reactions catalyzed by proteins and complexes 
and GO terms with experimental evidence codes. 
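
The best-bidirectional-hit criterion can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (query, subject, bit score), the function names and the gene identifiers are hypothetical, and the additional cutoffs on alignment quality, alignment length and synteny described above are deliberately omitted. It is not the Pathway Tools implementation.

```python
def best_hits(hits):
    """Map each query gene to its single best-scoring subject gene."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical BLAST results as (query, subject, bit score) tuples.
master_vs_target = [
    ("mg_geneA", "w_gene1", 200.0),
    ("mg_geneB", "w_gene2", 150.0),
    ("mg_geneB", "w_gene3", 90.0),
    ("mg_geneC", "w_gene2", 120.0),
]
target_vs_master = [
    ("w_gene1", "mg_geneA", 200.0),
    ("w_gene2", "mg_geneB", 150.0),
    ("w_gene3", "mg_geneB", 90.0),
]
pairs = bidirectional_best_hits(master_vs_target, target_vs_master)
print(pairs)
```

Here mg_geneC is not paired because its best hit, w_gene2, has a better match elsewhere; such a case would be left for a curator to resolve.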

We have initially transferred annotations from EcoCyc 
to orthologous genes in the BioCyc.org databases for the 
K-12 strain W3110 and the B strain REL606. Manual 
updates to orthologs in both databases are under way. 
Well-known differences between the metabolic capabilities
of the K-12 and B strains will be captured in our current 
curation effort. 

EcoCyc METABOLIC FLUX MODEL 

The MetaFlux software generates steady-state metabolic 
flux models from pathway/genome databases (1). This 



approach ensures that updates to the database are auto- 
matically reflected in the generated model. We have 
generated (1) an FBA model for EcoCyc that can be
executed using MetaFlux as part of the downloadable 
software/database bundle that includes Pathway Tools 
and EcoCyc; the model is also available as an SBML file 
within the EcoCyc downloadable files (http://biocyc.org/download.shtml).
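
For readers unfamiliar with FBA, the standard textbook formulation can be written as a linear program. The symbols below are generic notation introduced here for illustration and are not taken from MetaFlux.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

In this generic form, S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, and c selects the biomass reaction. The steady-state constraint S v = 0 requires that every internal metabolite be produced as fast as it is consumed. A nutrient condition is typically expressed through the bounds on uptake fluxes, a gene knockout by fixing the bounds of the affected reactions to zero, and growth is predicted when the optimal biomass flux is greater than zero.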

The EcoCyc FBA model comprises 1888 total reactions; 
the model produces 58 biomass metabolites with 370 re- 
actions carrying non-zero flux, from a minimal medium 
that includes glucose and ammonium. We assessed the 
accuracy of the model against the growth observations 
and gene knockout data in EcoCyc. The model predicted 
growth versus no growth correctly for 72.6% of 383 
growth conditions in EcoCyc. The model predicted 
growth versus no growth for the 4207 single-gene knock- 
outs in (21) with 91.2% accuracy. 
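
The accuracy figures above come from comparing predicted and observed growth condition by condition. A minimal sketch of that comparison follows; the function names and the four toy conditions are hypothetical and are not the EcoCyc data or the MetaFlux evaluation code.

```python
def prediction_accuracy(observed, predicted):
    """Fraction of shared conditions where prediction matches observation."""
    shared = observed.keys() & predicted.keys()
    if not shared:
        raise ValueError("no conditions in common")
    correct = sum(observed[cond] == predicted[cond] for cond in shared)
    return correct / len(shared)

def discrepancies(observed, predicted):
    """Conditions where model and experiment disagree, for curator review."""
    shared = observed.keys() & predicted.keys()
    return sorted(cond for cond in shared if observed[cond] != predicted[cond])

# Toy data: True means growth, False means no growth.
observed = {"glucose": True, "pyruvate": True, "citrate": False, "glycerol": True}
predicted = {"glucose": True, "pyruvate": False, "citrate": False, "glycerol": True}

accuracy = prediction_accuracy(observed, predicted)
mismatches = discrepancies(observed, predicted)
print(accuracy, mismatches)
```

A mismatch such as the toy pyruvate case (growth observed but not predicted) is the kind of discrepancy that can point to a missing reaction in the database.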



WEB INTERFACE UPDATES: WEB GROUPS 

Web Groups are a new aspect of the EcoCyc/BioCyc web 
site that allow users to create, store, analyze and display
groups of genes, metabolites, pathways and other entities 
within EcoCyc. Groups can also be shared with specific 
colleagues or made fully public. Although a full description
of Web Groups is beyond the scope of this article, we
provide here a sample use scenario for Web Groups. We 
will create a Web Group containing a set of E. coli genes 
of interest (e.g. from a gene expression experiment or from 
some other type of experiment), and use several tools to
determine commonalities among that set of genes.

Figure 2. The section of an EcoCyc gene page that provides gene essentiality information. The example shows GO terms, MultiFun terms and essentiality data for trpC knockouts, listing each growth medium, the overall growth call and the individual growth observations.

To begin using Web Groups, a user must first create an 
account (groups are stored within a user's account) and 
start a Web Groups session (command Tools->Groups). 
A Groups command menu then becomes available in the 
menu bar. 

Web Groups can be created in several ways, e.g. by 
uploading the gene list from a file, or from a list of
search results. Most EcoCyc object pages also allow you 
to add an object (e.g. a metabolite) to an existing group. 
Once a new group is created, a single column of informa- 
tion will be shown, namely the name of each gene. 
Additional properties of these EcoCyc gene objects can 
be selected. 

One way to see whether the genes in this group are 
found in a common set of metabolic pathways is to 
create a new column in which we transform each gene to 
the pathways its product is present in by selecting the 
'Pathways of gene' transform. That column can be con- 
verted to a new group by clicking on the '+' in the column 
heading. Many other transformations are available; for
example, a gene group can be transformed to a list of all
genes that are regulated by genes in the group, or to a list
of all orthologous genes in another organism. The list of
transformations available depends on the type of objects 
within the current group. 

Another way to investigate pathway relationships 
within a gene group is to highlight those genes on the
EcoCyc metabolic map diagram, which we call the 
Cellular Overview. 

A final way of investigating shared relationships among 
these genes is by using enrichment analysis, which is a 
statistical technique for determining whether a set of 
entities (such as our gene list) is statistically over- 
represented for members of other known sets. For 
example, does our gene set contain more genes from a 
given metabolic pathway than we expect by chance? The 
'Enrichments' menu allows you to apply several statistical 
tests to a group; the exact tests available depend on the 
type of objects within the group. Currently, enrichment 
analyses are available for gene groups and metabolite 



groups. In addition to testing a gene group for pathway 
enrichment, tests for enrichment for GO terms are avail- 
able, as is a test for whether the genes in a group share 
regulators in common more frequently than would be 
expected by chance. Finally, you can perform all of 
these tests at once and see the results sorted by P-value.
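
The question "does our gene set contain more genes from a given pathway than we expect by chance?" can be made concrete with a one-sided hypergeometric test. The text does not name the statistical tests offered by the Enrichments menu, so the choice of test, the function name and the numbers below are assumptions made for illustration.

```python
from math import comb

def enrichment_p_value(population_size, set_size, group_size, hits_in_group):
    """Probability of at least hits_in_group set members in a random group.

    population_size: all genes considered (e.g. all genes in the genome)
    set_size: genes belonging to the known set (e.g. one pathway)
    group_size: genes in the user's group
    hits_in_group: genes in the group that belong to the known set
    """
    total = comb(population_size, group_size)
    upper = min(group_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(population_size - set_size, group_size - k)
        for k in range(hits_in_group, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 10 genes overall, 4 in the pathway,
# a group of 3 genes of which 2 are in the pathway.
p_value = enrichment_p_value(10, 4, 3, 2)
print(p_value)
```

When many pathways or GO terms are tested at once, the resulting P-values would normally also need a multiple-testing correction, which this sketch does not include.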

More information about Web Groups is available 
through a BioCyc Webinar (http://www.biocyc.org/webinar.shtml)
and from the BioCyc Web site User's
Guide (http://www.biocyc.org/BioCycUserGuide.shtml). 
For example, groups can be exported to files. We expect 
that in late 2012, set operations will be available for 
groups, and it will be possible to manipulate sequences 
using groups. 

USING EcoCyc AS A TEACHING RESOURCE 

With the goal of assisting undergraduate student learning 
of microbiology principles in large classroom settings, we 
are exploiting EcoCyc for college-level instruction. This 
approach uses web-based tutorials to orient the student 
in accessing and using EcoCyc. Student learning 
modules introduce basic microbiological principles, and 
a set of complementary student exercises is designed to 
reinforce topics covered in formal classroom lectures. 
We reason that web-based exercises can deepen student 
understanding of basic microbiology concepts and 
improve overall class performance. The web-based educa- 
tional approach also allows for independent and 
self-paced learning while increasing the depth of inquiry 
and study, which is not easily accomplished in large classroom
settings.

We authored additional EcoCyc-based educational mod- 
ules and evaluated their effectiveness in an introductory- 
level microbiology lecture course of 255 students at 
UCLA in the spring of 2012. These modules describe 
basic principles of E. coli nutrient uptake, energy gener- 
ation by aerobic and anaerobic respiration, substrate-level 
phosphorylation, fermentation, genome organization, gene 
regulation and genome/organism comparison. The mater- 
ials are accessible at the newly created E. coli student portal 
web site at http://ecolistudentportal.org. 
How the learning materials were implemented

Following the introduction of a basic microbiological 
principle/process in class lecture, the instructor assigned 
a web-based task to each student. After performing a 
web-based research inquiry, each student answered a set 
of questions to demonstrate mastery of the assigned 
topic — for example, to identify genes/gene product rela- 
tionships for specific membrane transport systems, the 
enzyme system affinities, and specificities for substrate(s). 
The student also provided a brief statement explaining the 
rationale and approach used. Exercises were graded for 
accuracy and completeness. The major goals of each 
task in this project were to have each student demonstrate 
understanding of the class-introduced concepts, master 
use of a state-of-the-art microbial MOD and stimulate 
inquiry-based learning of a topic beyond the class lecture. 

How the materials were evaluated 

A student survey was conducted at the end of the course 
using the tools at http://www.salgsite.org/. Questions were 
designed to measure student perceptions of learning gains 
made as the result of the EcoCyc-based exercises. Student 
response rate was >91%, and the mean response and con- 
fidence intervals were determined for all replies. 

Outcomes 

Overall, 97% of the students successfully
completed all assigned web tasks, which comprised 10% of
the final course grade. For the assigned exercises, the mean
student scores ranged from 91 to 96%, with individual scores
ranging from 71 to 100%. This was the first class
exposure to a web-based MOD learning experience, and a 
majority of the students were excited about using a research-grade
tool to access and analyze data regarding E. coli
biology. Responses to the student exit poll revealed that
the goals of the exercises were generally clear, were relevant 
to class material and reinforced learning of general micro- 
biological principles. Our instructional goal to have every 
student demonstrate proficiency in self-directed inquiry 
using the EcoCyc database was nearly achieved. 

Future directions 

We plan to author additional E. coli learning modules and 
complementary exercises in other areas of basic microbiol- 
ogy using the EcoCyc database as a web-based resource. 
Outreach to other institutions is planned to facilitate 
sharing of course materials and to assist in classroom im- 
plementation of these materials. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online:
Supplementary Tables 1 and 2 and Supplementary 
Figure 1. 